PREFACE



This report consists of four parts. The first part is a non-technical
summary of the basic problem and our attempted solution, written by
Charles Kadushin. The second part is a technical review of the
literature and a description of the basic algorithm used in our
solution, and was written by Richard Alba and Charles Kadushin. The
third part describes the use of the Sociogram System and was written
by Richard Alba, chief programmer for this project, who has also been
responsible for developing our basic algorithm. The fourth part
describes the use of CHAIN, a program for discovering sociometric
linkages, and was written by Peter Abrams and Richard Rosen. Although
the various sections of the report have been the particular
responsibility of the persons given credit for them, all of us have
participated in extensive discussion on all phases of the work.

SOCIOMETRIC CLIQUE IDENTIFICATION



INTRODUCTION



Sociometry



Sociometry is a method for ascertaining the relationship between
units. Usually, the presence or absence of a relationship of one unit
to all other units is mapped. Sometimes the relationship mapped is one
of degree. Typical relations that have been studied are the liking or
disliking of people for each other, the admiration in which they hold
each other, and with whom they talk, communicate, or otherwise interact.
Although the units studied have most often been people, especially
school children, the method is obviously not limited to individuals.
Relationships between different organizations or parts of an organization,
as well as between cities and countries, have also been studied.

Once the relationship between all the units under study has been
ascertained, a graph called a sociogram displays the network of relations.
The familiar road map is one kind of sociogram, for it displays the
relationship between cities or other points on a map. The line which
connects the cities is proportionate in length to the distance between
them. A map or sociogram not only gives an instant visual image of the
network between units, but also can suggest clusterings of the units into
cliques or regions. Further, the network displays an individual unit not
only as having direct relations with other units, but also as having
indirect relations via other units.

The collection of systematic data on networks of relations between
units allows for many mathematical operations that can enhance our
understanding of the network (Abelson, 1966). The simplest operations allow
for the location of "stars" or "sinks". These are units into which many
relationships flow. The most popular child in a class is a star. Or one
may compute the reverse, that is, units which have many relationships
which lead away from them. An effective social climber is one kind of
such unit, for he seeks out many persons. Some units may have many direct
relations with others, while some units may have many indirect relations.
For example, a person may have a great many friends, none of whom has much
consequence or power. Another person may have one good friend, but that
friend may know many other well connected people who, in turn, are well
connected. Thus, the ratio of direct to indirect links for each individual
may be computed. These are but some of the many operations that can be
performed on networks.
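
The simplest of these counts can be stated in a few lines of code. The
following is a minimal sketch and not part of the programs described in
this report: the function names, the variable names, and the sample data
are hypothetical, and each choice is assumed to be recorded as an ordered
pair (chooser, chosen).

```python
from collections import Counter

# Hypothetical sociometric choices, each an ordered pair (chooser, chosen).
choices = [
    ("ann", "bea"), ("cal", "bea"), ("dee", "bea"),
    ("dee", "ann"), ("dee", "cal"),
]

def choices_received(pairs):
    """Count the choices flowing into each unit (a "star" scores high)."""
    return Counter(chosen for _, chosen in pairs)

def choices_made(pairs):
    """Count the choices leading away from each unit."""
    return Counter(chooser for chooser, _ in pairs)

received = choices_received(choices)
made = choices_made(choices)
star = received.most_common(1)[0][0]    # unit receiving the most choices
seeker = made.most_common(1)[0][0]      # unit making the most choices
```

In this sample, "bea" receives the most choices and "dee" makes the most.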

Since sociology, social psychology and the study of educational
organizations always imply the study of social relations, it would seem
that sociometry would be a widely used tool. Because of the ease with
which data can be collected in school systems, education studies are
probably the largest single source of sociometric data. In fact,
sociometry was invented to deal with the problems of a corrective
institution for adolescent girls.

Despite the apparent power of sociometry to deal with problems of
group relations, the method has been a tool ahead of its time. The vast
amount of sociometric data which has been collected has rarely been
properly analyzed. It is impossible to hand draw sociograms of groups
much larger than 25 individual units. In fact, most larger sized
sociograms that we have inspected and compared, when possible, with the
original data, contain at least one error. Not only is it impossible to
draw sociograms by hand, but it is equally impossible to assign units to
cliques or regions. The juggling involved seems too difficult for an
unaided human. Because of these very practical difficulties, sociometry
has never been able to fulfill its promise.



There is a long history of attempts to solve the problem of
clustering. Beginning in 1949 (Luce and Perry), various matrix algebra
techniques were introduced in attempts to simplify the clique formation
problem through mathematical reduction of the data. As late as 1965
(Harary, Norman and Cartwright), attempts were made to apply graph theory
to sociometry, since the sociogram itself is one kind of graph. For
technical reasons which Spilerman (1966) and Abelson (1966) give, and
which are elucidated in Alba and Kadushin (1970) which forms part of this
report, sheer mathematical methods are simply not able to cope with the
complexity of sociometric data. Some iterative algorithm (a rule of
procedure which is repeated over and over again with each approximation
to the answer somewhat better than the previous one) might be more helpful.



Even before high speed digital computers with fairly large memories
were developed, various investigators proposed several algorithms with
which card sorting accounting machines could rearrange sociometric data so
that cliques might be more readily seen. A recent approach is Spilerman's
(1966) method, which uses a large, high speed digital computer. But even
this modern attempt is disappointing. It is effective only if there are
relatively few connections between units. Moreover, the computer program
itself has several "bugs". The most serious problem, of which the author
of the program was unaware, is the fact that different data arrangements
would be produced depending on where in the data the program started to
operate.



Toward a Solution of the Sociometric Problem 



It is safe to say that at the time we began our work the sociometric
problem still defied solution. Nonetheless, the basic tools for solving
the problem seemed, for the first time in 30 years, to be lying about
merely waiting for someone to put them together. To begin with, Columbia
University had just acquired one of the world's largest computers (an IBM
Model 360-91 coupled to a 360-75), as well as a data plotter. More
important, recent work in non-metric scaling and in numerical taxonomy
(see Green, et al., 1968 for a relatively non-technical review) suggested
that many of the clustering and display problems that have proved so
irksome to social science were in the process of being solved. Today,
in 1970, the entire field of non-metric clustering is less than ten
years old. We thought that clustering programs could be enlarged with
the aid of our new mammoth computer and adapted to sociometric analysis.

There were several hurdles in the way. The most crucial problem
was the fact that most sociometric data was, and still is, collected as
dichotomous data; that is, a unit either is or is not related to another
unit. No known clustering technique can work with such data - in fact,
this was the main problem in previous attempts to cluster sociometric
data, even before the advent of new non-metric techniques. Our basic
insight into this barrier came from the theory of social circles (Kadushin,
1968). Neither in actual life nor in artificial sociograms is a unit
merely related to other units directly. The essence of sociometric
analysis is that it shows that units may be indirectly related to other
units. A is related to B who, in turn, is related to C. We had
previously argued that this simple fact is a key to the way modern social
systems operate. But this fact could also be used to create a measure of
relatedness between pairs of units so that these units could then be
grouped with some of the new approaches to clustering. A is one step away
from B and two steps away from C. In fact, several previous mathematical
analyses of sociograms had pointed this out but were not able to connect
it to clustering techniques, because those clustering techniques had not
yet been invented.



In order to make the best possible use of the idea of the number of
steps from one unit to another, we investigated a computer program
developed by James Coleman (1954) which was designed to show how many
persons in a group were "ultimately", that is, even through a very long
indirect chain, connected to all other members of the group. Thus, instead
of showing no connection between A and C, the program would show a
connection if they were both directly connected to B. The measure of
"connectedness" he developed was simply the number of actual connections
divided by the number of possible connections for any given group. The
program was also able to show the shortest number of steps from one unit
to another. Further, the program gave a "printout" of all units which were
even remotely connected to a starting unit. This printout is a very useful
research tool in "debugging" sociometric data. The only problem with the
program was that it did not work, and contained errors in calculations even
when the programming errors were straightened out!
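
The two quantities just described, the shortest number of steps and the
"connectedness" ratio, can be illustrated as follows. This is a sketch
under stated assumptions and is neither Coleman's program nor ours: links
are assumed to be directed, "connections" are assumed to be counted over
ordered pairs of distinct units, and all names and data are hypothetical.

```python
from collections import deque

def shortest_steps(links, start):
    """Breadth-first search: fewest steps from `start` to each unit it reaches."""
    successors = {}
    for source, target in links:
        successors.setdefault(source, []).append(target)
    steps = {start: 0}
    queue = deque([start])
    while queue:
        unit = queue.popleft()
        for nxt in successors.get(unit, []):
            if nxt not in steps:
                steps[nxt] = steps[unit] + 1
                queue.append(nxt)
    del steps[start]          # report only the other units
    return steps

def connectedness(units, links):
    """Connected ordered pairs divided by all possible ordered pairs."""
    actual = sum(len(shortest_steps(links, u)) for u in units)
    possible = len(units) * (len(units) - 1)
    return actual / possible

# Hypothetical group: A chooses B, B chooses C, D is isolated.
units = ["A", "B", "C", "D"]
links = [("A", "B"), ("B", "C")]
steps_from_a = shortest_steps(links, "A")   # {"B": 1, "C": 2}
ratio = connectedness(units, links)         # 3 of 12 ordered pairs: 0.25
```

The dictionary returned for a starting unit plays the role of the
"printout" of all units even remotely connected to it.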



Our first job was to correct Coleman's program and make it generally
available for third generation computers (it had been written for a second
generation computer). CHAIN (see page 46 following) describes this program.
Those familiar with Coleman's program, as printed in his book (1954), will
note a number of new features which add to the program's flexibility, both
in data handling and data analysis. Together with RENUM (pages 27-31),
almost any kind of input data can be handled and the basic part of the
program allows for the generation of data that can be used for a large
variety of sociometric data analysis purposes.






Our original purpose in developing CHAIN was to produce a matrix
which gave the shortest path from one unit to another. This matrix was
to be clustered with the aid of Johnson's hierarchical clustering
program (pages ). This program clusters all units which are not
more than a given distance away from one another (Diameter method).
Thus, all units which are one step away from each other are clustered,
then all units which are one and two units away from each other are
clustered, then all units which are one, two and three units away from
each other are clustered, and so on. In addition, in its connectedness
method, the program groups units which can be reached from a given
starting unit. Thus, if we start with unit A, a unit which is one step
away from A is added to the cluster, then a unit which is one step away
from the unit just added is included, and so on. Both these methods
seemed ideal for clustering sociometric "distance matrixes" as produced
by CHAIN.
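
The difference between the two methods can be seen in a small example.
The sketch below is not Johnson's program; it is a simplified
illustration that assumes a symmetric matrix of step counts, treats the
diameter method as merging on the largest pairwise distance and the
connectedness method as merging on the smallest, and uses hypothetical
names and data throughout.

```python
def cluster_at_level(dist, level, method):
    """Merge clusters until no pair of clusters is within `level`.

    method="diameter":      cluster distance is the largest pairwise distance.
    method="connectedness": cluster distance is the smallest pairwise distance.
    """
    link = max if method == "diameter" else min
    clusters = [[i] for i in range(len(dist))]
    while True:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = link(dist[i][j] for i in clusters[a] for j in clusters[b])
                if d <= level and (best is None or d < best[0]):
                    best = (d, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

# Hypothetical step counts for four units arranged in a line: 0 - 1 - 2 - 3.
steps = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]
by_diameter = cluster_at_level(steps, 1, "diameter")            # [[0, 1], [2, 3]]
by_connectedness = cluster_at_level(steps, 1, "connectedness")  # [[0, 1, 2, 3]]
```

At level 1 the three adjacent pairs are tied, so the diameter result
depends on which tied pair happens to be merged first; here the first
pair encountered wins.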



The only problem was that no clustering method could work with the
data which CHAIN produced. The matrix of shortest distance from one unit
to another led to many ties. There were many units which were only one
step away from other units, many which were two steps, and so on. Almost
all clustering programs are confused by a proliferation of ties. Some
algorithm for relatedness would have to be found which did not lead to
many ties. Further, it was also discovered through the process of
laboriously checking out the results of clustering the shortest distance
matrix, that some method of accurately drawing sociograms by computer would
have to be developed before any further work with clustering techniques.
Existing hand drawn sociograms proved to contain errors, so that they could
not be used to verify computer produced clusters. Without an accurate, easy
to read sociogram, large numbers of different types of sociometric clusters
could not be compared with the original data, and for the development of
effective clustering techniques, it is necessary to work with many
different types of sociograms. The drawback of previous efforts was that
they worked with the data on hand, but proved unable to cope with other
sociometric data.



Because of these problems we started all over. First we constructed
a new algorithm (described in detail in the appendix to Alba and Kadushin,
pages 19-22 below) for measuring relatedness. This algorithm uses much
more of the data, for it considers not only the shortest paths, but all
non-redundant paths between any two units. Longer paths count for less,
however, than shorter paths. Because so much more of the data is used,
the chances of ties are quite low. In addition, the algorithm is likely
to locate units in space relative to each other in a way which is much more
like the original data.

Now that we had a matrix of relatedness which gave an almost infinite
variety of distances of one unit from another, we used a non-metric scaling
technique which located units in two or more dimensional space such that
the rank order of the distance between units in the space created by
the program was roughly the same as the rank order of the distances
between units in the original data. The problem is something like being
given a mileage chart between cities of an unknown country and being
asked to construct a map based on the mileage chart. It sounds easy,
but once more, this is the sort of problem which humans are essentially
incapable of solving on their own.



Kruskal's non-metric scaling program (pages 34-38 below) located
our units in two dimensional space. It remained only to draw lines
between them which, as in conventional sociograms, represent direct
choices. To do this we developed a plotting program which plots the
points, labels them, and draws the appropriate lines representing the
direct choices. For the first time since sociometry was invented, it was
possible to obtain accurate and objective sociograms. Sample sociograms
are shown in Alba and Kadushin, pages 10-15. They come from a study of
opinion makers. Although the purpose of the study was to develop programs
for the study of schools, opinion maker data were on hand and seemed
unusually complex. If we could produce satisfactory sociograms for this
data, surely we could do so for school systems. The program has, in fact,
recently been used for school studies and has proved most satisfactory.



It is rare in social science that there is an objective goal - either
one is right or one is wrong. In our case, we had a definite goal - to
produce computer drawn, objective sociograms - something no one had ever
been able to do before. We succeeded. Nonetheless, this report is a
half-way mark. The ultimate goals of, first, objective clique identification
and, second, the production of fully documented computer programs
adaptable to different installations, have not yet been attained. And yet we
feel that we have solved the problems in principle: that our algorithm for
finding relatedness between pairs effectively produces a measure that can
be scaled to produce sociograms, or clustered to produce cliques. We feel
that we have generally identified the class of scaling and non-metric
clustering techniques that will work with this algorithm. Our set of
computer programs to produce sociograms and perform rudimentary clique
analysis is wholly operational at Columbia University and can be used by
persons with no knowledge of programming. The Columbia University Computer
Center, which participated in the development of the program, will give
access to any academic groups interested in using our program.



In keeping with the nature of our progress, then, this report consists
of three parts. The first is a paper submitted to Sociometry. The second
consists of detailed instructions for using the set of programs at the
Columbia Computer Center. All the programs except the relatedness program
can be used at most large computer installations. The relatedness program
will operate only with very large IBM computers. The third part of the
report consists of a program write-up of CHAIN, a program which finds the
shortest number of steps between two points of a sociogram. This program
was useful in our early work, and represents a correct version of a program
published by Coleman.



Despite the incomplete nature of our work, it is my feeling as
principal investigator, that we have written more programs of a more
sophisticated nature and solved more problems than is usual for projects
of this size.

Abstract

The solution of the sociometric clique identification problem
could be greatly advanced if an efficient method of generating
sociograms could be invented. An efficient, computer-based method
is presented in this paper. This method is composed of two steps.
In the first, a matrix of pairwise relatedness is calculated; the
measure of relatedness depends upon the number and lengths of
paths from one point to another in a directed graph. In the second
step, a multi-dimensional scaling technique is used to generate the
configuration of points in space whose interpoint distances best
monotonically match the measures of pairwise relatedness. This
configuration is then displayed on a cathode ray tube plotter and
the lines are drawn where a relationship exists by virtue of direct
nomination; the result is interpreted as a sociogram.






THE CONSTRUCTION OF SOCIOGRAMS BY COMPUTER METHODS^

Richard D. Alba and Charles Kadushin
Columbia University and Teachers College, Columbia University

Although sociometry was once associated with a particular
psychological theory (Moreno, 1934), it has come to mean any method for
collecting and analyzing patterns of relations between individual
units. Typically, a sociometric question on a survey asks a respondent
to name those people who bear a particular relation to him: those he
likes, those whom he asks advice, and so on. An observation as coded
for sociometric analysis usually consists of an identification of the
respondent together with an identification of the persons to whom he
is related.



Sociometry has had wide application outside the study of small
groups where it started. It has been used to study interlocking boards
of directors (Guttsman, 1963), informal networks among members of an
elite (Kadushin, 1963; Agger, 1964), relations between community
organizations (Young and Larson, 1965), "invisible colleges" among
scientists (Mullins, 1968; Crane, 1969), and the diffusion of new
practices among physicians (Menzel, Coleman and Katz, 1966). Perhaps
the most frequent application has been to studies of clique formation
among school children (Coleman, 1961).



There is a large body of literature, some of which will be presented
in the following section, which addresses the problem of identifying
cliques and subgroups among individuals in a large group. A clique may
be "intuitively defined as a subset of members who are more closely
identified with one another than they are with the remaining members of
their group" (Hubbel, 1965: 377). The identification problem is
unsolved.



In very general terms, there are two ways by which to handle the
problem of clique identification. The first is to devise some numerical
or mathematical rule by which likely subsets of individuals can be
identified; ideally this rule can be incorporated in a computer program
and large bodies of data can be efficiently handled. The second is to
present the data in a form, such as the sociogram, which is isomorphic
to the original data and yet which increases the visibility of cohesive
subgroupings. The researcher may then identify cliques in a manner
which is intuitively satisfying.



^This research was supported by a grant to Charles Kadushin by the grants
program of the Office of Education. The authors wish to thank Kenneth
King, Director of the Columbia Computer Center, and Dr. Charles Roberts
of Bell Laboratories for their help and encouragement. Richard Rosen
materially advanced the research with some early conceptualization and
experimentation. The help of Peter Abrams has been invaluable.

The difficulty with the first method, that of identifying subsets
of individuals by a numerical or mathematical rule, is that the resultant
cliques are not necessarily the most intuitively satisfying ones. The
difficulty with the second method is that sociograms must be constructed
by hand and, for groups of any size, such constructions are tedious at
best.

If an efficient means of generating sociograms could be invented,
then the problems of validating mathematically-based methods of clique
identification would be considerably eased. Spilerman's comments are
germane:

One way by which the researcher could investigate the
biases inherent in a prespecifying routine would be to
compare its decisions with his perceptions regarding the
structural components present in the data. For example,
he might construct the sociogram for a portion of the
group and select which groupings he wants identified as
cliques, which definitely are not cliques, and which
groupings appear ambiguous. Then, any objective
procedure which agrees with his first two decisions could
be allowed to classify the ambiguous groupings and to
process the remaining data with a reasonable assurance
that it will not grossly transgress his perceptions.

For this reason it seems likely that the provision of
efficient means for representing group structure will
facilitate the empirical validation and adoption of
more sophisticated mathematical techniques. (Spilerman,
1966: 313)

This paper will present an efficient, computer-based method of
generating sociograms. The authors believe that they have produced
the first computer-drawn sociograms. The reader should be warned
that the programs which have produced these sociograms are not yet
ready for general distribution. However, the algorithm and method
are presented in detail in this paper.

Previous Work on Clique Identification 



In the previous section, we mentioned two possible approaches 
to clique identification: the first is that of mathematical 
techniques; the second is that of isomorphic presentation, such as 
the sociogram. 



A sociogram is a pictorial representation composed of points 
and lines connecting points where each point represents an individual 
and a line, which may have an arrow to indicate direction, connects 
two points if the individuals represented by those points are related 
by virtue of one having chosen the other or each having chosen the 
other.

While a sociogram is one kind of representation of sociometric
data, there are two other kinds which are important because they
connect sociometry with powerful sets of mathematical theory. The
first of these is the directed graph; the theory of directed graphs
is a formal way of talking about some important features of networks.
A directed graph, itself, is best visualized as a directed picture
such as a sociogram. It is defined formally as a set of points
together with a set of ordered pairs, whose elements belong to the
set of points. The second representation of sociometric data is the
adjacency matrix A. The elements of the adjacency matrix indicate
direct relationship or lack of it; the entry aij, the entry in the
ith row and the jth column of A, is 1 if individual i names, or is
directly related to, individual j and is 0 otherwise. We can
therefore bring to bear on the problems of sociometric analysis two
powerful sets of mathematical theories: those of matrix algebra and those
of directed graphs.
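
The correspondence between the two representations is mechanical, as the
following sketch suggests. It is an illustration only: the function name
`adjacency_matrix`, the unit labels, and the nominations are
hypothetical.

```python
def adjacency_matrix(units, nominations):
    """Build A, where A[i][j] is 1 if unit i names unit j and 0 otherwise."""
    index = {unit: k for k, unit in enumerate(units)}
    n = len(units)
    A = [[0] * n for _ in range(n)]
    for chooser, chosen in nominations:
        A[index[chooser]][index[chosen]] = 1
    return A

# A directed graph given as a set of points and a set of ordered pairs.
units = ["a", "b", "c"]
nominations = [("a", "b"), ("b", "a"), ("b", "c")]
A = adjacency_matrix(units, nominations)
# A == [[0, 1, 0],
#       [1, 0, 1],
#       [0, 0, 0]]
```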

Nearly all mathematical techniques of clique identification have
drawn from one or both of these sets of theory. One such group of
techniques is composed of those in which a formal definition of a
clique is presented; such a formal definition is usually stated in
terms of the properties of adjacency matrices, matrices derivable
from adjacency matrices, or directed graphs.

Foremost among the methods which use a formal definition of
cliques are those presented in Luce and Perry (1949) and in Harary,
Norman, and Cartwright (1965). Both methods appeal to concepts in
directed graph theory to define a clique and then use operations on
adjacency matrices to locate subsets of the entire group which
satisfy the definition of a clique. (It should be noted that Harary,
et al., never actually use the term "clique" but their treatment of
certain topics in directed graph theory, such as strong components,
implies a definition.)

A concept from directed graph theory which is fundamental to any 
approach to clique identification is that of reachability. Intuitively, 
one individual is reachable from a second if there exists some chain 
of individuals by which the second can communicate to the first; for 
example, if a talks to b, and b talks to c, then c is reachable from a. 
While reachability is a property of points in a directed graph or 
individuals in a network, the reachability relations of a directed 
graph can be determined from the various powers of the adjacency matrix 
isomorphic to it. Non-zero elements in the square of the adjacency 
matrix, to take an example, occur only for individuals or points which 
are connected by a chain of two links. 
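
The relation between powers of the adjacency matrix and reachability can
be checked on the three-person example just given. The sketch below is
illustrative only; the helper names `mat_mul` and `reachability` are
hypothetical, and reachability is taken here to mean that a chain of
between 1 and n-1 links exists, which suffices for distinct points in a
group of n.

```python
def mat_mul(X, Y):
    """Ordinary matrix product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def reachability(A):
    """reach[i][j] is True if j can be reached from i in 1 to n-1 links."""
    n = len(A)
    total = [row[:] for row in A]
    power = A
    for _ in range(n - 2):
        power = mat_mul(power, A)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return [[total[i][j] > 0 for j in range(n)] for i in range(n)]

# a talks to b, and b talks to c (rows and columns ordered a, b, c).
A = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
A2 = mat_mul(A, A)        # the only non-zero entry is row a, column c
reach = reachability(A)   # c is reachable from a, but a is not from c
```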

Luce and Perry (1949) propose using the third power of the adjacency
matrix to identify subsets of individuals which might be cliques. As
Spilerman (1966) points out, this method may fail to produce common
sense cliques. According to Abelson (1966), there is no reason to think
that chains of any given length (say, 3) are more indicative of cliques
than chains of any other length.

The concept of strong component, as presented in Harary, Norman,
and Cartwright (1965), seems appealing as a possible definition of
clique. Two individuals are in the same strong component if each is
reachable from the other. When we condense a directed graph by
replacing each of its strong components by a single point, the resulting
condensed directed graph preserves the essential reachability features
of the original. The problem seems the same: if we take strong
components to be cliques, the resultant cliques may not have great intuitive
appeal. Moreover, when the relation is symmetric, such as "talk with,"
entire networks may disappear into a single point when condensed.
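
The collapse of a symmetric network into a single point is easy to
demonstrate. The following sketch groups units by mutual reachability;
it is a simple illustration rather than the procedure of Harary, Norman,
and Cartwright, and the function names and data are hypothetical.

```python
def reachable_from(links, start):
    """Return the set of units reachable from `start`, including itself."""
    successors = {}
    for source, target in links:
        successors.setdefault(source, []).append(target)
    seen = {start}
    stack = [start]
    while stack:
        unit = stack.pop()
        for nxt in successors.get(unit, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def strong_components(units, links):
    """Group units so that each is reachable from every other in its group."""
    reach = {u: reachable_from(links, u) for u in units}
    components, placed = [], set()
    for u in units:
        if u in placed:
            continue
        component = [v for v in units if v in reach[u] and u in reach[v]]
        components.append(component)
        placed.update(component)
    return components

units = ["a", "b", "c", "d"]
directed = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d")]
# A symmetric relation such as "talk with": every link runs both ways.
symmetric = directed + [(t, s) for s, t in directed]

directed_parts = strong_components(units, directed)    # [["a", "b"], ["c"], ["d"]]
symmetric_parts = strong_components(units, symmetric)  # [["a", "b", "c", "d"]]
```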

Other writers have presented ways of manipulating the data so that
the cliques become more visible to the eye, even though the data is
presented in a form isomorphic to the original, that is, with all the
detail of the original. Coleman and MacRae (1960) present a method for
reordering the adjacency matrix in such a way that cliques and cohesive
subgroups appear as clusters or clumps along the diagonal. As Spilerman
(1966) points out, these clusterings appear only under ideal conditions.
For reasonably complex networks, the clique clusters along the diagonal
have "octopus arms" extending out from them which confound the
interpretation of results.

Spilerman (1966) attempts to combine the best features of the
Coleman and MacRae method with those of sociogrammatic presentation. He
presents a method to facilitate the construction of a sociogram from the
adjacency matrix. His method generates a series of linear chains from
the adjacency matrix; these linear chains are a decomposition of the
sociogram which may be constructed by hand from them. While his method
would seem to work well with structures which are basically chains (such
as the example from the Adolescent Society in Figure 2), it is not clear
that it would yield satisfactory results with structures which are more
complex (such as either of the examples from the Yugoslav data).

Hubbel's method (1965) represents an important strand in the body
of mathematical techniques of clique identification. His method is an
attempt to cluster points on the basis of a matrix whose elements measure
the relatedness of individuals in a network.

The first step in his method is to prepare the matrix of relatedness.
He starts from a matrix W whose elements measure the degree of immediate
relatedness between individuals. To measure the degree of indirect
relatedness, that is, relatedness across chains of some given length, say
p, the matrix W need only be raised to the pth power. The matrix Y which
is the sum of W and its power matrices is the matrix of relatedness which
we will use to cluster the points; each element of Y measures the degree
of total relatedness (total in that it is based on all direct and indirect
relations) for some pair of points.

The second step is to cluster the points based upon the values of
the elements in Y so that clusters of highly interrelated points emerge.
Hubbel suggests collecting those dyads (or pairs) for which the
corresponding element of Y is above some arbitrary threshold value.
Cliques are then constructed from these dyads.
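
The two steps can be sketched as follows. This is an illustration under
stated assumptions, not Hubbel's own procedure: the sum of powers is
truncated at a chosen p, W is assumed symmetric so that only pairs with
i < j are examined, and the weights, the threshold, and all function
names are hypothetical.

```python
def mat_mul(X, Y):
    """Ordinary matrix product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def total_relatedness(W, p):
    """Y = W + W^2 + ... + W^p (a truncated sum; the choice of p is assumed)."""
    n = len(W)
    Y = [row[:] for row in W]
    power = W
    for _ in range(p - 1):
        power = mat_mul(power, W)
        Y = [[Y[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return Y

def dyads_above(Y, threshold):
    """Collect pairs (i, j), i < j, whose element of Y exceeds the threshold."""
    n = len(Y)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if Y[i][j] > threshold]

# Hypothetical immediate relatedness among three individuals.
W = [[0.0, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
Y = total_relatedness(W, 2)
dyads = dyads_above(Y, 0.4)   # [(0, 1), (1, 2)]
```

Individuals 0 and 2 have no direct relation, yet Y gives them a non-zero
relatedness of 0.25 through their common link to individual 1.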



Of course, the same objection which has been raised against
other mathematical techniques may be raised against this one: namely,
the resulting cliques are not necessarily the ones the researcher
would have identified had he been able to visualize the network as
a sociogram. There is another important question which may be
raised: Are there more appropriate clustering methods than the one
used by Hubbel?

A Proposal for the Computer Construction of Sociograms 

The method which we are about to describe comes closer to the 
method of Hubbel than to any of the other methods we discussed in the 
review of the literature; that is, it is composed of two steps: the 
first is one in which a matrix of relatedness is derived; the second 
is one in which the points are clustered based upon the matrix of 
relatedness. It is very important to observe that these two steps are 
the crux of any modern solution to sociometric clique identification. 
It can be shown that data with a large number of ties, especially 
dichotomous data such as choosing a person or not choosing a person, 
cannot be clustered except under special circumstances. A measure of 
relatedness between persons which can assume a relatively large number 
of different values must therefore be developed. Such a measure might 
be derived from respondents' ratings of other persons (Hubbel, 1965; 
Abelson, 1956). Because the collection of such data can be quite 
cumbersome in large groups, our approach is to derive the measure from 
simple dichotomous choice data. Once a satisfactory measure of 
relatedness is derived, then a variety of new and powerful numerical 
clustering methods become applicable. Further, the most immediately 
visible way in which our method differs from other attempts to cluster 
sociometric data is that, rather than rest with emergent clusters of 
points as cliques, lines are drawn, after clustering, between points 
where a relationship exists by virtue of nomination and the result is 
interpreted as a sociogram. The authors do not contend that their 
particular method is a priori the best one, but only that it has 
yielded significant results on bodies of data to which it has been 
applied and thus may point a way to a general class of sociometric 
clique identification methods.

It seems reasonable to base the measure of relatedness on the 
graph properties of the network; that is, the relatedness of any pair 
of points is measured by the number of chains between the points and 
the lengths, or numbers of links, in these chains. Thus, we could 
count the chains or paths between the points, weight each chain 
according to length so that the longer chains count less, and then 
sum the weights to arrive at our measure.




Figure 1 



Thus, in Figure 1, there is one 1-link chain from A to B (A B), 
one 2-link chain from A to B (A C B), and several 3-link chains from 
A to B (one such is A D C B). In one of the 3-link chains from 
A to B, namely, A B A B, a link appears more than once in the chain; 
such a chain, in which the same link appears more than once, is a 
redundant chain. On the other hand, there is one 1-link chain from 
E to F, and there is one redundant 3-link chain (E F E F). In 
terms of our relatedness measure, then, A is more closely related to 
B than is E to F.

We have already noted that there is a relation between the number 
of chains between any two points in a directed graph and the values of 
corresponding elements in the powers of the adjacency matrix; that is, 
we can determine from $A^p$ the number of chains of length p between any 
two points. Moreover, we can express the idea of attenuation, that is, 
that chains of increasing length count for less, by scalar multiplication 
of $A^p$ by $a^p$, where a is a positive number less than 1.

Utilizing these two ideas, we can describe the matrix of relatedness, 
R, as follows: 

$R = aA + a^2 A^2 + a^3 A^3 + \dots + a^p A^p + \dots$

Having described the relatedness measure in terms of matrix operations 
on the adjacency matrix, our measure is easily programmed on a computer. 
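
As an illustration only, the series for R can be sketched in a few lines of Python. The sketch truncates the series at a fixed power (the parameter max_power is an assumption introduced here, as is the small example graph), and it counts redundant chains along with the others.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def relatedness(adj, a=0.5, max_power=4):
    """Return a*A + a^2*A^2 + ... + a^p*A^p, truncated at p = max_power."""
    n = len(adj)
    power = [row[:] for row in adj]            # holds A^p, starting at p = 1
    total = [[0.0] * n for _ in range(n)]
    for p in range(1, max_power + 1):
        weight = a ** p                        # attenuation for chains of length p
        for i in range(n):
            for j in range(n):
                total[i][j] += weight * power[i][j]
        power = mat_mul(power, adj)            # advance to A^(p+1)
    return total


# Hypothetical directed graph on three points: 0 -> 1, 0 -> 2, 2 -> 1.
A = [[0, 1, 1],
     [0, 0, 0],
     [0, 1, 0]]
R = relatedness(A, a=0.5, max_power=3)
```

Here point 0 reaches point 1 by one 1-link chain and one 2-link chain, so its relatedness to point 1 is 0.5 + 0.25.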

One minor difficulty is that in counting chains by iteratively 
raising the adjacency matrix to all possible powers, we also count 
redundant chains. Redundant chains can be eliminated by utilizing a 
suggestion of Coleman (Coleman, 1964), in which elements of the adjacency 
matrix are set to zero when the links which they represent are used to 
construct a chain, thereby preventing the reuse of these links.* Moreover, 
this zeroing out procedure makes the process of finding all 
possible chains a finite one. The exact algorithm we use is described 
in the appendix.
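
The way powers of a symmetric adjacency matrix pick up redundant chains can be seen in a very small case. The following Python sketch (the function names and the two-point example are assumptions made for illustration) counts chains of length n between two points joined by a single reciprocated tie.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def chains_of_length(adj, n):
    """Return the nth power of adj; entry (i, j) counts chains of length n."""
    power = [row[:] for row in adj]
    for _ in range(n - 1):
        power = mat_mul(power, adj)
    return power


# Two points joined by one reciprocated tie.
A = [[0, 1],
     [1, 0]]
counts = {n: chains_of_length(A, n)[0][1] for n in (1, 2, 3, 5)}
```

Although there is only one link, a chain is counted at every odd length, because the same link is traversed back and forth. These are exactly the redundant chains which the zeroing out procedure is meant to exclude.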

There are two basic ways by which a relatedness matrix may be made 
comprehensible. In one, the points are placed into N dimensional space; 
in the other, the points are ordered into a taxonomy. The first 
corresponds to presenting the matrix as a sociogram; the second to 
placing individuals into cliques. Until very recently, factor analysis 
and discriminant analysis were the only practical techniques for 
approximating either goal. Recently developed numerical techniques 
(Green, Carmone, and Robinson, 1968; Ball, 1965) seem better suited, 
however, to the unknown properties of a relatedness measure. The



*Another idea implied by Coleman as a way of avoiding redundancy is to 
use the length of the shortest chain as the measure of relatedness. This 
method requires no weighting and is intuitively pleasing. Experimentation 
with it suggested, however, that not enough information is used to form 
adequate graphs or cliques if the data are complex. See Richard Rosen 
and Peter Abrams, CHAIN, 1970. 

clustering method which we use is non-metric multidimensional scaling† 
(Kruskal, 1964), which, as its name implies, does not assume that our 
measure satisfies metric properties.

Furthermore, a computer-based method for generating sociograms has 
higher priority than a computer-based method for identifying cliques, 
since the adequacy of the latter can be evaluated by visual comparison 
with the former. Accordingly, the method proposed differs from other 
attempts to cluster sociometric data in that after the points have been 
placed in two-dimensional space, lines are drawn between points where a 
relationship exists by virtue of direct nomination. The result is 
interpreted as a sociogram.

Of course, a procedure of placing points randomly in space and 
drawing lines where a direct relationship exists would only infrequently 
result in a comprehensible sociogram. For an arbitrary configuration of 
points to result in a satisfactory sociogram, it is necessary (although 
probably not sufficient) that points which are directly related be 
closer in space than points which are not. More generally, it is 
necessary that the interpoint distances reflect the interpoint 
relatedness; that is, the more related two points are, the closer they 
should be in space.

This last requirement is the rationale for our selection of 
multidimensional scaling as a clustering technique. Multidimensional 
scaling is a numerical technique which, given a matrix of similarity 
(or dissimilarity) measurements, constructs the configuration of points 
in space whose matrix of interpoint distances comes closest to monotoni- 
cally matching the matrix of similarities. If we take the matrix of 
similarities to be our matrix of relatedness, then the more similar or 
related two points are, the closer we would expect them to appear in 
space after scaling. 
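
The monotone requirement can be checked directly for any proposed configuration. The Python sketch below is not the scaling program; it only counts, for a given configuration, how many pairs of point-pairs have a distance order that contradicts their relatedness order. The function name, the relatedness values and the two configurations are all assumptions made for illustration.

```python
import math
from itertools import combinations


def order_violations(related, coords):
    """Count pairs of point-pairs whose distance order contradicts
    their relatedness order (more related, yet farther apart)."""
    pairs = list(combinations(range(len(coords)), 2))
    dist = {p: math.dist(coords[p[0]], coords[p[1]]) for p in pairs}
    violations = 0
    for p, q in combinations(pairs, 2):
        related_diff = related[p[0]][p[1]] - related[q[0]][q[1]]
        distance_diff = dist[p] - dist[q]
        # Same sign means the more related pair is also the more distant one.
        if related_diff * distance_diff > 0:
            violations += 1
    return violations


# Hypothetical symmetric relatedness values for three points.
related = [[0.0, 0.9, 0.2],
           [0.9, 0.0, 0.8],
           [0.2, 0.8, 0.0]]
good = [(0.0, 0.0), (1.0, 0.0), (2.1, 0.0)]   # closely related points adjacent
bad = [(0.0, 0.0), (2.1, 0.0), (1.0, 0.0)]    # points 1 and 2 interchanged

good_count = order_violations(related, good)
bad_count = order_violations(related, bad)
```

A configuration with no violations matches the relatedness matrix monotonically; the scaling program seeks a configuration that comes as close to this as the data permit.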

To generate sociograms the points are first separated into disjoint 
sets, that is, sets such that no point in a set is related to any point 
in another set. Our programs then calculate a relatedness matrix for 
each set based upon a symmetric adjacency matrix. The points of each 
set are scaled into two-dimensional space, the resultant configuration 
is displayed on a cathode ray tube plotter, and lines are drawn by the 
plotting program between points where appropriate. Some of the results 
are shown below. Figure 4 is a complete reproduction of a hand drawn 
previously published sociogram (Figure 2) which is a "chain". Only 
those points which are connected to each other in the large chain are 
shown, since the remaining five sets of units are trivial. The chart in 
Figure 3 lists the correspondence between the numbers in Coleman's 
original chart and the consecutive renumbering performed by our program.



†We wish to thank J.B. Kruskal, S.C. Johnson, and M. Wish of Bell 
Laboratories for making MDSCAL IV available to us and for consulting 
with us on various problems on non-metric techniques. 

The points on the computer drawn sociogram are indicated by a small 
dot on the lines with the identification number adjacent. In this 
scale, the points are better seen by noting inflections of the lines.

In this scale, the tight clusters of 58, 55, 59, etc. (in the 
original numbers) are compressed beyond legibility in an 8 x 11 format, 
a problem that can in part be solved by "blowing up" the diagram (the 
program itself can do this) and in part by a method to be discussed 
shortly. It should also be noted that we made an error in transcribing 
the original data to punch cards. This is shown by the extra point 17 
in the computer drawn diagram. We have preserved this error because it 
demonstrates our original reason for producing computer drawn sociograms: 
in every case of large sociograms in which we possessed the original 
punch card data and the hand drawn sociogram of the original investigator, 
we discovered at least one discrepancy between the punch card and the 
original sociogram.



Figure 5 represents the reported interaction connections between 
members of the two major Federal Yugoslav Legislative bodies in 1968, 
and is introduced to show the utility of our approach for large scale 
open system sociometric analyses. Again, the sociogram only shows the 
largest set of ultimately connected members. Analysis of the sociogram 
will be reported elsewhere (Barton, Denitch and Kadushin, 1970). It 
may be noted here, however, that the central cluster around members 9, 
21, 48, etc. includes the chief formal officers of the legislative 
bodies, those whom experts had independently rated as key figures, as 
well as a key figure in the Yugoslav League (Party) of Communists. The 
quadrants are actually formed by members of different Yugoslav 
nationalities.



By now it should be apparent that certain features of the sociograms 
produced by our method differ from those which have been hand drawn. In 
traditional sociograms the length of lines connecting points is entirely 
arbitrary and generally dependent only on the requirements of demonstrating 
the formation of cliques. In our algorithm, the relatedness between two 
points depends on the number of other points they are connected with, the 
number of paths to these points, and the number of points in the chain 
between points. To the extent that the MDSCAL algorithm is able to 
preserve the relative distance between points as represented by the way 
we measure the relatedness of points, to that extent the length of lines 
has some meaning. So in general, the short lines in the middle of the 
sociogram in Figure 5 suggest that the average degree of relatedness 
between the points in the middle is higher than the relatedness between 
points at the periphery. The large loops, say 44, 33, 24, 28, 39, 11, 
[illegible], 51, 5, 41, and back to 44, are caused by the fact that to 
get from any one point on the loop to any other point one generally has 
to go through a large number of other points, and hence the relatedness 
is generally low and the "distance" from any point to any other is 
relatively great, so the size of the loop on the graph is quite large. 
This feature also tends to force so-called sociometric stars toward the 
center of the graph. Thus, if there appear to be several cliques in a 
graph their stars will not be at the center of each clique, but will 
appear more toward the center of the entire graph. This feature is 
caused by the tendency of the multidimensional scaling algorithm to 
locate the zero point of the coordinate system at the center of gravity 
of the configuration. Similarly, points which have few connections to 
anyone "important" are "exploded" to the periphery of the graph.

Relative to the points on the periphery, the points which are 
highly related may be overly compressed. To see how this compression 
may happen, let us suppose that we have noted a configuration for all 
points but one (the scaling program does not actually work this way). 
The relatedness measures of this point to all other points will serve 
as a series of constraints forcing this point to a unique location. We 
can visualize these constraints as an array of forces acting in many 
directions on the point; if the relatedness matrix is completely 
consistent with the two-dimensional configuration, then the point will be 
moved by these forces until a spot is reached in which they are all 
zero; in general, however, this consistency will not hold and the point 
will be moved until a spot is reached where the sum of the 
forces is a minimum and the forces in one direction are offset by the 
forces in the opposite direction.



To visualize how a sociogram produced by this method may be 
distorted by inconsistency between the relatedness matrix and two-dimensional 
space, let us examine Figure 6, which portrays the connected points of 
Yugoslav mass organization leaders. As might be expected from the history 
of Yugoslavia, this group is unusually dense. Hence in this figure, the 
part of the sociogram marked with a dotted line is compressed beyond 
readability.



The genesis of the problem can perhaps most easily be visualized by 
imagining the points in the compressed area as wanting to push the long 
arms out to the right, while the relatedness of these arms to the 
structure on the left tends to pull the arms to the left. In this push 
and pull, the weight of the arms is sufficient to overwhelm and compress 
part of the structure. In this case, a solution to the problem is to set 
the relatedness measures of the arms to the remaining structure to zero, 
allowing the arms to "float free" (of course, a few of these relatedness 
measures are preserved so that the general orientations of the arms 
remain). The resulting sociogram is shown in Figure 7.



Conclusion 



While a number of problems remain to be solved, it appears that 
new numerical methods along with a re-evaluation of the meaning of 
sociometric relatedness now promise a solution to the problem of finding 
intuitively satisfying cliques in large sociograms. Indeed, the size of 
the sociograms is limited only by the ingenuity of programmers and the 
size of the research budget (at present, big matrices are costly to 
handle). It seems to us that the following steps in the development of 
the study of informal structures of society are now all but inevitable: 
(1) solution of the "compression" problem; (2) the graphing of large 
numbers of sociograms in a wide body of data; (3) the generation of more 
efficient techniques for such graphing, including the exploration of 
the effect of various algorithms for finding relatedness; (4) the 
generation of adequate numerical hierarchical clustering methods for 
sociograms and their testing on wide bodies of data; (5) the development 
of better global descriptions of large sociograms, that is, the 
development of typologies of sociograms; and most important (6) the 
development of theories of informal structures which relate typologies 
of structure to other relevant social facts such as typologies of 
formal structures.



Appendix. The details of the algorithm



Although the distances between points in a sociogram have not been 
defined as meaningful, we start from the recognition that any pictorial 
representation involves constructing a configuration of points in a metric 
space, and hence distances emerge as a byproduct. This recognition led to 
the generating idea of our technique: the use of non-metric multi-dimensional 
scaling to generate a configuration of points in the plane.

Having decided to use multi-dimensional scaling to construct the 
configuration we will interpret as a sociogram, we must define some appropriate 
similarity or dissimilarity measure. It seems reasonable to base this measure 
on the number of nonredundant chains* between points and the lengths, or 
numbers of links, in these chains. 

Figure 8




Several minor difficulties should be noted. Figure 8 shows two sociograms 
which require the same configuration of points in space. In sociogram A, all 
relationships are unreciprocated or asymmetric; moreover, there are no chains 
between any two of the points 2, 3, 4, 5. Therefore, our similarity measure 
is zero for any pair of points drawn from this set; in other words, we have no 
information as to how these points should be arranged in space. In sociogram B, 
however, all relationships are reciprocated. Examination shows that our 
measure is non-zero for any pair of points in the sociogram. Comparison of




*We use the term "chain" where Harary, Norman, and Cartwright (1965) use the 
term "sequence". Our notion of nonredundant chain corresponds loosely to their 
notion of path, although our nonredundant chain is more inclusive than their 
path; that is, not every nonredundant chain is a path, while every path is a 
nonredundant chain. Luce and Perry (1949) use the term "chain" in a way which 
corresponds to our use. In their paper, the term "nonredundant chain" corresponds 
exactly to "path", however. 

these two sociograms suggests that we obtain more information for the purpose 
of locating points in space when we reciprocate all direct relations. In the 
following discussion we will assume that we are working with a "symmetricized" 
relation.

In addition to symmetricizing the data, we partition it into disjoint 
subsets such that no point in any set is reachable from any point in any 
other set, and any point in any set is reachable from every other point in that 
set. If we did not partition the set in this manner, it is quite likely that 
there would be at least two points which were not reachable from each other; 
then, the similarity measures would be zero for these points and we would have 
no information about the distances between the points in space.
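
The symmetricizing and partitioning steps can be sketched together in Python. This is an illustration under stated assumptions, not the program described in this report: choices are assumed to be held as a dictionary from each respondent to the list of persons he names, and the function names are hypothetical.

```python
def symmetricize(choices):
    """Turn directed choices {i: [j, ...]} into an undirected neighbor map."""
    neighbors = {}
    for chooser, chosen in choices.items():
        neighbors.setdefault(chooser, set())
        for other in chosen:
            neighbors[chooser].add(other)
            neighbors.setdefault(other, set()).add(chooser)
    return neighbors


def disjoint_subsets(choices):
    """Partition the points into sets within which every point is reachable
    from every other, and between which no point is reachable."""
    neighbors = symmetricize(choices)
    seen, subsets = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:                      # depth-first walk from the start point
            point = stack.pop()
            if point in component:
                continue
            component.add(point)
            stack.extend(neighbors[point] - component)
        seen |= component
        subsets.append(sorted(component))
    return subsets


# Hypothetical data: 1 names 2, 2 names 3, 4 names 5, and 6 names no one.
choices = {1: [2], 2: [3], 4: [5], 6: []}
subsets = disjoint_subsets(choices)
```

Each resulting subset would then be scaled separately, so that no pair of points within a subset has a similarity of zero merely because the points cannot reach each other.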

We now present certain known results for directed graphs (Coleman, 1964; 
Harary, Norman and Cartwright, 1965). If A is the adjacency matrix, and 
$a_{ij}^{(n)}$ is the element in row i and column j of $A^n$, the nth power of A, 
then $a_{ij}^{(n)}$ is the number of chains of length n from i to j; that is, 
we can determine the chains of length n between two points from the nth 
power of A. In counting by this 
method we include redundant chains as well as nonredundant ones. The inclusion 
of redundant chains stems from the symmetry of the adjacency matrix. For any 
two directly connected points, i and j, say, there is a chain of length 3, 
namely i -> j -> i -> j. In fact, there is a chain of length n, where n is any 
odd number, from i to j. This phenomenon of constructing chains by using the 
reverse (j -> i) of the relation just previously used (i -> j) we will call 
"doubling back."

Coleman presents a method which can be extended to prevent doubling back. 
He suggests zeroing out elements of the adjacency matrix as they are used; this 
zeroing out corresponds to removing a directed relation as soon as it is used 
to construct a chain, thereby preventing its reuse. We can prevent doubling 
back by zeroing out the symmetric relation as well as the one used to create

the chain; that is, we zero out $a_{ji}$ as well as $a_{ij}$. 

Coleman's method can be defined as follows. We start from the m x m 
adjacency matrix A. Let $R_i^{(n)} = (r_{ij}^{(n)})$ be a row vector of m elements; 
$r_{ij}^{(n)}$ will be the number of chains of length n we count from i to j. 
Let $A_i^{(n)} = (a_{jk,i}^{(n)})$ be an m x m matrix; $A_i^{(n)}$ will result from the 
adjacency matrix A after the nth step from i. We define $R_i^{(n)}$ and 
$A_i^{(n)}$ inductively: 

(1) $R_i^{(1)} = (r_{ij}^{(1)})$; $r_{ij}^{(1)} = a_{ij}$; that is, $R_i^{(1)}$ is the ith row of A; 

    $A_i^{(1)} = (a_{jk,i}^{(1)})$; $a_{ij,i}^{(1)} = 0$ and $a_{ji,i}^{(1)} = 0$; 

    that is, $A_i^{(1)}$ is A with its ith row and column zeroed out. 

(2) $R_i^{(n)} = R_i^{(n-1)} A_i^{(n-1)}$; 

    $A_i^{(n)} = (a_{jk,i}^{(n)})$; $a_{jk,i}^{(n)} = 0$ if $r_{ij}^{(n-1)} a_{jk,i}^{(n-1)} > 0$ or $r_{ik}^{(n-1)} a_{kj,i}^{(n-1)} > 0$; 

    and $a_{jk,i}^{(n)} = a_{jk,i}^{(n-1)}$ otherwise; that is, $A_i^{(n)}$ results from $A_i^{(n-1)}$ by 
    zeroing out the elements corresponding to links used in constructing 
    chains of length n from i.

This method, however, does distort our intuitive notions in some ways. The 
first way is its failure to find "the other path". Let us consider the simple 
structure in A of Figure 9. Computing the chains from point 1, we notice that 

Figure 9

there are two chains of length 1 from point 1: one from 1 to 2, and one from 
1 to 3. B in Figure 9 shows the structure we are left with after step 1; the 
broken lines indicate relations which have been zeroed out. At step 2, we find 
two chains of length 2, both from 1 to 4. We are now left without any remaining 
relations, as shown in C of Figure 9. We failed to find the two chains of 
length 3: the one between 1 and 2 and the one between 1 and 3.
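
The behavior just described can be reproduced with a short Python sketch of the zeroing out procedure. Two things are assumptions here: the reading of the procedure (a link is zeroed in both directions as soon as it is used to extend a chain), and the structure itself, which is taken to be four points with ties 1-2, 1-3, 2-4 and 3-4 because that structure is consistent with the chain counts given in the text. Points are numbered from 0 in the code.

```python
def coleman_chain_counts(adj, i):
    """Count chains from point i, zeroing out each link (in both
    directions) as soon as it is used. Returns {length: row of counts}."""
    m = len(adj)
    links = [row[:] for row in adj]
    r = links[i][:]                      # step 1: the ith row of the matrix
    for k in range(m):                   # zero out row i and column i
        links[i][k] = 0
        links[k][i] = 0
    counts, n = {}, 1
    while any(r):
        counts[n] = r
        # Extend every chain of length n by one remaining link.
        nxt = [sum(r[j] * links[j][k] for j in range(m)) for k in range(m)]
        used = [(j, k) for j in range(m) for k in range(m)
                if r[j] > 0 and links[j][k] > 0]
        for j, k in used:                # prevent reuse and doubling back
            links[j][k] = 0
            links[k][j] = 0
        r, n = nxt, n + 1
    return counts


# Assumed structure: ties 1-2, 1-3, 2-4, 3-4 (points 0..3 in the code).
A = [[0, 1, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 0, 1],
     [0, 1, 1, 0]]
counts = coleman_chain_counts(A, 0)
```

The procedure stops after step 2 with no relations remaining, so the two chains of length 3 are never counted.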



There is yet another way in which our intuition is distorted; this problem 
we will call "multiplication when paths join." Consider the structure in 
Figure 10. We note that our method would compute two chains, each of length 2, 
from 1 to 4; it would also compute two chains, one of length 1 and the other of 
length 2, from 2 to 4. While our method would find two chains from 1 to 5, it 
would only find one chain from 2 to 5 because the chains from 2 to 4 are of 
different lengths.




Figure 10 

There is a way of partially correcting for this problem. We noted that 
$r_{ij}^{(n)}$ is the number of chains of length n which we count from i to j; 
$r_{ij}^{(n)}$ is an element in the row vector $R_i^{(n)}$ which is used to compute 
the chains of length n+1 from i. Let us define a row vector 
$B_i^{(n)} = (b_{ij}^{(n)})$ such that $b_{ij}^{(n)} = 1$ if $r_{ij}^{(n)} > 0$ and 
$b_{ij}^{(n)} = 0$ otherwise; that is, the jth element of $B_i^{(n)}$ is 1 if there 
are one or more chains from i to j of length n. Then we use $B_i^{(n)}$ to 
compute the chains of length n+1 from i; that is, 
$R_i^{(n+1)} = B_i^{(n)} A_i^{(n)}$.

Let us now define the similarity measure for any two points i and j. If 
the reader will remember, the similarity measure should be a function of the 
number of nonredundant chains between i and j and the lengths of these chains. 
$r_{ij}^{(n)}$ is the number of nonredundant chains of length n from i to j as 
best we can count them. To control for length we introduce an attenuation 
constant, a, where 0 < a < 1. We then define 

$s_{ij} = \sum_{n=1}^{K} a^n r_{ij}^{(n)}$, 

where K is the step at which the process is exhausted. Since $s_{ij}$ is not 
necessarily equal to $s_{ji}$, we define the similarity measure as: 

$s_{ij}^{*} = \frac{s_{ij} + s_{ji}}{2}$.
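
Taken together, the steps of this appendix can be sketched in Python as follows. This is one reading of the procedure and not the program itself: it assumes that each step extends chains from a vector reduced to 0's and 1's, that a used link is zeroed in both directions, and that the final measure is the average of the two directed sums. The function names and the three-point example are hypothetical.

```python
def attenuated_counts_from(adj, i, a):
    """Return the attenuated chain sums from point i to every point j."""
    m = len(adj)
    links = [row[:] for row in adj]
    r = links[i][:]                           # chains of length 1 from i
    for k in range(m):                        # zero out row i and column i
        links[i][k] = 0
        links[k][i] = 0
    s, n = [0.0] * m, 1
    while any(r):
        for j in range(m):
            s[j] += (a ** n) * r[j]           # weight chains of length n by a^n
        b = [1 if x > 0 else 0 for x in r]    # reduce counts to 0's and 1's
        nxt = [sum(b[j] * links[j][k] for j in range(m)) for k in range(m)]
        for j in range(m):
            if b[j]:
                for k in range(m):
                    if links[j][k] > 0:       # link used: zero both directions
                        links[j][k] = 0
                        links[k][j] = 0
        r, n = nxt, n + 1
    return s


def similarity(adj, a=0.5):
    """Average the two directed sums to obtain a symmetric measure."""
    m = len(adj)
    s = [attenuated_counts_from(adj, i, a) for i in range(m)]
    return [[(s[i][j] + s[j][i]) / 2 for j in range(m)] for i in range(m)]


# Hypothetical symmetricized chain of three points: 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
S = similarity(A, a=0.5)
```

In this example the two end points are joined only by a chain of length 2, so their similarity is 0.25, half that of the directly connected pairs.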
The Sociogram System 
A User’s Manual 



Richard Alba 



June 1970 



Teachers College 
and 

Bureau of Applied Social Research 



Columbia University 



User's Manual for the Sociogram System 



The General Flow of the System 

The first step in the procedure is to partition the population into 
disjoint subsets, each one of which will become a sociogram. 

The population is renumbered by RENUM, yielding a set of observations 
whose id's are numbered sequentially from 1. The renumbered population 
is then input to the connectivity program (CONNCT2) which computes a 
relatedness matrix and an adjacency matrix. Either of these is then input 
to the Hierarchical Clustering program (HIER), from whose output the 
disjoint subsets may be read. 

A particular subset is then renumbered once again and relatedness and 
adjacency matrices are computed. The relatedness matrix is then scaled by 
MDSCAL into two dimensional space and the optimum configuration is punched 
out. This configuration together with the adjacency matrix is then input 
to the plotting program (PLOT) which produces an output which may be 
plotted on the 4060. 

The Renumbering Program (RENUM) -- version 3 

The purpose of the renumbering program is to renumber a body of 
sociometric data so that the respondent id's proceed in sequence, 
starting from 1.

The program allows the user to delete individuals from the sample 
or add to the sample individuals who are named by someone in the sample 
but are not themselves represented by an observation in the sample. In 
the case where individuals are deleted from the sample, all references 
to these individuals by others in the sample are deleted. Individuals 
are deleted from the sample because their id appears on a list, called 
the acceptance/rejection list, supplied by the user (or fails to appear) 
or because they are named too few times by others in the sample (the 
number of times they must be named to be retained is a parameter to the 
program) . 

The third version of the renumbering program allows jobs to be 
batched; that is, the user may renumber as many different bodies of 
sociometric data as he wishes in one submission. 

control cards 

1. the parameter card 

cols 1-5: NE -- the exact number of people on the list of 
those to be accepted or rejected; the parameter REJ determines 
whether these people will be accepted or rejected; leave this 
field blank or code a 0 if there is no such list; 

cols 6-9: NC -- the maximum number of choices per observation; 

cols 10-13: KONS -- the number of times an individual 
presently represented by an observation in the sample must be 
chosen to be retained in the sample; 

cols 14-17: KONS1 -- the number of times an individual 
named but outside the sample must be chosen to be represented in 
the sample; note that if this field is non-zero and an individual 
outside the sample fails to be named a sufficient number of 
times, then all references to this individual will be deleted in 
the output; that if this field is zero then all individuals who 
are named but outside the sample are represented by observations 
in the output; that if this field is non-zero and an individual 
outside the sample is named a sufficient number of times, then 
this individual is represented by an observation in the output. 
If this parameter is greater than the number of people 
in the sample, then all references to individuals outside the 
sample will be deleted in the output. 

col 18: REJ -- code a 0 if the people on the acceptance/rejection 
list are to be rejected; code a 1 if anyone not on the 
list is to be rejected; if there is no list (NE is 0), then the 
contents of this field are not used; 

col 19: SMPL -- code a 0 if there is to be no simplification 
through the elimination of duplicate choices; code a 1 if 
simplification is to take place; this simplification occurs when 
one person is chosen twice by another; in this case, one choice 
is eliminated.
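
For readers preparing cards by program, the column layout of the parameter card can be sketched in Python. The sketch is illustrative and is not part of RENUM. It assumes the six fields run consecutively through column 19 (so that the third field is KONS in columns 10-13 and REJ is in column 18), and it assumes that a blank field is read as zero, which the text states only for NE.

```python
# (name, first column, last column), with columns counted from 1.
FIELDS = [("NE", 1, 5), ("NC", 6, 9), ("KONS", 10, 13),
          ("KONS1", 14, 17), ("REJ", 18, 18), ("SMPL", 19, 19)]


def parse_parameter_card(card):
    """Split a parameter card into its integer fields by column position."""
    card = card.ljust(19)                      # pad short cards with blanks
    values = {}
    for name, first, last in FIELDS:
        text = card[first - 1:last].strip()
        values[name] = int(text) if text else 0   # assumption: blank means 0
    return values


# Hypothetical card: 12 people on the list, at most 3 choices, KONS = 2,
# KONS1 = 0, reject anyone not on the list, no simplification.
card = "   12" + "   3" + "   2" + "   0" + "1" + "0"
params = parse_parameter_card(card)
```

Each field is right-justified within its columns, as is usual for integer fields read under a FORTRAN format.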
2. the input format card 

This card contains a FORTRAN format statement which is used 
to read the data records. Only integer format items should be used. 
The first item in the format list corresponds to the respondent's 
id. All subsequent items correspond to the choices of the respondent.

The first item of the format statement is also used to read 
the acceptance/rejection list if there is any; that is to say, each 
record in the acceptance/rejection list contains one item of the 
list and that item is located in the same position in the record as 
the respondent's id is located in the data record. 

3. the output format card 

This card contains a FORTRAN format statement which is used 
to write the output records; these records are written on the file 
whose ddname is FT07F001 and are, thus, normally punched. Each item 
in this format statement is an integer item. Each output record 
will contain the following information: the old identification 
number of the respondent, the new renumbered identification number, 
and the renumbered identification numbers of this individual's 
choices, if any.

4. the title card 

The title card may contain any alphameric information. 

the deck setup 

The order of control cards to process any deck of sociometric 
data is as follows: 

title card 
input format 
output format 
parameter card 
acceptance/rejection list, if NE is non-zero 
data

At Columbia, the following is the job setup which is required 
to run the program: 

/*SETUP DEVICE=2311,ID=MDK004 
//JOBLIB DD DSN=SYS5.SOCIO,DISP=(SHR,PASS), 
// UNIT=2311,VOL=SER=MDK004 
//A EXEC PGM=RENUM,TIME=n 
//FT06F001 DD SYSOUT=A 
//FT07F001 DD SYSOUT=B 
//FT05F001 DD * 

control cards and data 

/*

multiple submission 

The order of control cards and data for the processing of multiple 
decks in one job submission is as follows: 




Job 1's control cards and data 
blank observation 
Job 2's control cards and data 
blank observation 
. . . 
Job n-1's control cards and data 
blank observation 
Job n's control cards and data

The output decks are separated, in a multiple deck submission, 
by a separator card in which punches 12, 1, and 9 occur in every column.

restrictions 

1. Before any deletion, the total number of individuals in the 
sample plus the total number of individuals named but not in the sample 
must not exceed 3000. 

2. NC must not exceed 15. If NC exceeds 9, however, the printed 
output may not be correct (although the punched output will be correct). 

Connectivity Program 

There are two versions of the connectivity program. The first 
version counts all paths; the second counts distinct paths. As a 
general rule, the second is probably the more useful. 

The parameter cards for each version are the same and follow: 

(1) the card which specifies the number of individuals; it is 
of the form: 

N = int, where int is an integer, 
for example: 

N = 87 

(2) the card which specifies the attenuation constant; it is 
of the form: 

ALPHA = real, where real is a real number, 
for example: 

ALPHA = .50000 

(3) the card which specifies the maximum number of iterations 
to be performed; it is of the form: 

OMEGA = real, where real is a real number, 
for example: 

OMEGA = .0002 

the cycle n for which (ALPHA)^n >= OMEGA but (ALPHA)^(n+1) < OMEGA is the 
last to be computed.
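
The effect of OMEGA on the number of cycles can be worked out in advance. The following Python sketch is an illustration only: the comparison signs in the rule above are not legible in our copy, and the sketch assumes the last cycle n is the largest one for which ALPHA to the nth power is still at least OMEGA. The function name is hypothetical.

```python
def last_cycle(alpha, omega):
    """Return the largest n with alpha**n >= omega (assumed stopping rule)."""
    if not (0 < alpha < 1) or omega <= 0:
        raise ValueError("expects 0 < alpha < 1 and omega > 0")
    n = 0
    while alpha ** (n + 1) >= omega:     # the next cycle still clears OMEGA
        n += 1
    return n


# With the example cards above: ALPHA = .5 and OMEGA = .0002.
cycles = last_cycle(0.5, 0.0002)
```

With these example values the twelfth power of ALPHA is about .000244 and the thirteenth about .000122, so under this reading twelve cycles would be computed.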

(4) the card which specifies the maximum number of choices; it 
is of the form: 

NC = int, where int is an integer, 
for example: 

NC = 3 

These cards appear in the order in which they are stated here. 
Following them are two format cards, the first of which specifies the 
input format, and the second of which specifies the output format. 

The only fields for which provision is made in the input format 
are those which identify choices of the respondent. Each of these 
is designated by an 'I' format specification.

For example, suppose that there are five two digit fields on a 
card in columns 1 through 10; that the first is the recoded identification 
of the chooser, the second is the original identification of the 
chooser, the third through the fifth are the recoded identifications 
of the chosen. Then the input format would appear as follows: 

(4X, 3I2)
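
To make the effect of this format concrete, the following Python sketch reads one card the way the format (4X, 3I2) would: four columns are skipped and three two-column integers are read. It is an illustration rather than part of the system; the function name and the sample card are hypothetical, and blank fields are assumed to read as zero.

```python
def read_choices(record, skip=4, count=3, width=2):
    """Read `count` integers of `width` columns after skipping `skip` columns."""
    record = record.ljust(skip + count * width)     # pad short cards with blanks
    fields = []
    for k in range(count):
        start = skip + k * width
        text = record[start:start + width].strip()
        fields.append(int(text) if text else 0)     # assumption: blank means 0
    return fields


# Hypothetical card: recoded id 1, original id 17, choices 5, 9 and 12.
record = " 1" + "17" + " 5" + " 9" + "12"
chosen = read_choices(record)
```

The two identification fields in columns 1 through 4 are passed over, which is why the program needs no provision for them in the input format.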

In addition to printed output, there are two output data sets. 

A connection matrix, a matrix of 0's and 1's, where 1 indicates a 
first-level connection, is written on the ddname FT08F001. The format 
statement associated with this data set is (25F3.0). No identification 
number appears in the output record. It is suggested that this data 
set always be stored on disk or tape. 

A connectivity matrix, a matrix of positive real numbers, where 
relative magnitude indicates the degree of relatedness, is written on 
the ddname FT07F001. It is this data set for which the user must 
provide an output format statement. Again, no identification number 
appears on the output record, and so it is suggested that this data set 
always be stored on disk or tape. 

To summarize, then, the control cards appear as follows in a 
job setup: 

N = int 
ALPHA = real 
OMEGA = real 
NC = int 
input format 
output format 
data, which is the output from the renumbering program

To use the connectivity program at Columbia, the following job 
setup is required: 

/*SETUP DEVICE=2311,ID=MDK004 
/*SETUP DEVICE=2311,ID=DCU032 
//JOBLIB DD DSN=SYS1.M\T400,DISP=(SHR,PASS), 
// UNIT=2311,VOL=SER=DCU032 
//A EXEC PGM=DFRMrL,TIME=n,PARM='DCALNG=m' 
//FT03F001 DD SYSOUT=A,DCB=(RECFM=UA,BLKSIZE=133) 
//MTLPROG DD DSN=SYS5.SOCIO(CONNCT2),DISP=(SHR,PASS), 
// UNIT=2311,VOL=SER=MDK004 
//FT06F001 DD SYSOUT=A 
//FT07F001 DD 

output data set 

//FT08F001 DD 
//FT01F001 DD * 

control cards and data 

/*
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m, the DCALNG parameter, is calculated as follows: 

m s (20*N )/(1000), where N is the number of individuals. 
Moreover, the REGION parameter on the JOB card is approximately 450+m. 

restriction

1. The maximum number of choices, NC, may not exceed 20.

The Scaling Program (MDSCAL)



The scaling program (MDSCAL) was developed by Dr. J. B.
Kruskal of Bell Laboratories and F. J. Carmone.

strategy

We use the scaling program to find the configuration of
points in 2-dimensional space whose distances best fit the related-
ness or connectivities. The stress measure which is printed by the
program is a measure of goodness of fit; we seek that configuration
for which stress is a minimum.

To find this particular configuration it is best to repeat
the scaling several times, each time starting from a random config-
uration. In our own use of the system, we usually perform five
scalings of the data, selecting for plotting the one producing the
minimum stress. It should be noted that as many scalings as desired
may be performed in one job submission.
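The selection rule described here (several scalings from random starts, keep the one with minimum stress) can be sketched as follows. This is a sketch under stated assumptions, not MDSCAL itself: `best_of_runs` and `toy_scaling` are hypothetical names, and `toy_scaling` merely stands in for one scaling run by returning an arbitrary stress value and a random configuration.

```python
import random


def best_of_runs(scale_once, seeds):
    """Call scale_once(seed) -> (stress, configuration) for every seed
    and return (seed, stress, configuration) for the minimum stress."""
    best = None
    for seed in seeds:
        stress, config = scale_once(seed)
        if best is None or stress < best[1]:
            best = (seed, stress, config)
    return best


def toy_scaling(seed):
    """Stand-in for one scaling run (NOT the MDSCAL algorithm)."""
    rng = random.Random(seed)
    config = [(rng.random(), rng.random()) for _ in range(4)]
    stress = rng.random()
    return stress, config


seed, stress, config = best_of_runs(toy_scaling, seeds=[1, 2, 3, 4, 5])
print("keep the run started from seed", seed)
```

Because each run is fully determined by its seed, the selected run can be repeated later without change, which is what is needed when a deck is to be punched in a second submission.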

There are normally two job submissions involved in one use of
MDSCAL: one to locate the best configuration and one to obtain a
deck containing it. To obtain a deck it is only necessary to place
the CARDS card in the MDSCAL job (see below) and rerun the job
without further change. It should be noted that no changes may be
made in the jobs preceding the one producing the minimum stress if
it is to produce the same output.

MDSCAL control cards 

The following cards, punched as shown, constitute an MDSCAL
job. With exceptions as noted, the statements may be punched start-
ing in any column.

     DIMMAX=2,DIMMIN=2
     CARDS, if the configuration is desired on cards
     SECONDARY
     RANDOM=n, n any integer
     ITERATIONS=100
     CUTOFF=.0000001
     DATA REGRESSION=DESCENDING
     title card, title in columns 1-80
     parameter card: cols. 1-3 contain N, the number of points;
          col. 6 contains 1; col. 9 contains 1.
     format statement corresponding to the input data, which is
          the relatedness matrix
     COMPUTE



Job setup

     /*SETUP tape containing the relatedness matrix
     /*SETUP DEVICE=2311,ID=MDK004
     //JOBLIB DD DSN=SYS1.MDSCAL,DISP=(SHR,PASS),UNIT=2311,
     // VOL=SER=MDK004
     // EXEC PGM=MDSCALTA,TIME=n
     //FT02F001 DD UNIT=DISK,SPACE=(80,(25))
     //FT06F001 DD SYSOUT=A
     //FT07F001 DD SYSOUT=B
     //FT08F001 DD data set containing relatedness matrix
     //FT05F001 DD *
     MDSCAL job 1
          .
          .
          .
     MDSCAL job N
     STOP
     /*

The REGION parameter is 400K.

Limitations:

1. N may not exceed 100.


PLOT 



PLOT takes the configuration produced by MDSCAL together with the
adjacency matrix and produces from them the plot of the associated
graph structure. This plot is output by the program in the form of a
data set on tape which can be plotted by the META processor on the 4060,
the cathode ray tube plotter.

PLOT plots each point according to its coordinates in the con-
figuration and draws lines connecting points where they are indicated
by the adjacency matrix.

For legibility, the plot of the sociogram may be blown up over
a picture which is several pages long and the same number of
pages wide. Additionally, titles may be provided for each page to keep
track of the output.

input data sets

There are three input data sets to PLOT, besides the data set of
control cards, FT05F001: the data set containing the configuration pro-
duced by MDSCAL, FT08F001; the data set containing the dictionary of
identifications to be associated with the points, FT09F001; and the
data set containing the adjacency matrix, FT10F001.

1) the configuration produced by MDSCAL.

This data set is normally on cards. The first four cards are
removed from the data set before inputting it to PLOT.

The format statement corresponding to this data set should con-
tain only two fields, each one corresponding to the coordinate of
one dimension. If the points were scaled into a space higher than
two dimensions, only two may be selected for plotting. Each item in
the format statement should be an F or E item.

2) the dictionary of point identifications.

This data set is normally on cards. It may be the data set
containing the renumbered identifications produced by RENUM, the
renumbering program.

Each record normally contains two numeric fields: the first is
the identification number to appear in the output; the second is
the present (sequential) identification. These fields occur in the
order stated. An I format item (in the format statement) cor-
responds to each field.

This data set is used in the following manner. If the data set
is empty, each point is identified in the output by its sequential
identification. Otherwise, the sequential identification of each
point which has a corresponding record in the dictionary data set
is replaced by the value of the first field of the record.

Note: if this data set is empty, there must still be a
corresponding format and DD statement.
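The lookup rule just described can be illustrated with a short Python sketch. The function name `point_labels` and the representation of the dictionary as a list of pairs are assumptions made for the illustration; PLOT itself reads the records through the dictionary format statement.

```python
def point_labels(n_points, dictionary):
    """Return {sequential_id: label to print} for points 1..n_points.

    dictionary is a list of (output_id, sequential_id) records, with the
    fields in the order stated in the text. Points without a record keep
    their sequential identification.
    """
    labels = {seq: seq for seq in range(1, n_points + 1)}
    for output_id, seq in dictionary:
        if 1 <= seq <= n_points:
            labels[seq] = output_id
    return labels


print(point_labels(3, []))            # empty dictionary: sequential IDs
print(point_labels(3, [(1042, 2)]))   # point 2 is labelled 1042
```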

3) the adjacency or connection matrix.

This data set is normally on tape. It is produced by the
connectivity program.

This matrix is an N x N matrix, where N is the number of points.
Each entry in this matrix is a 0 or 1; 0 indicates the absence of a
connection and 1 the presence.

The format statement corresponding to this data set is normally
(25F3.0).
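For readers who wish to inspect this data set outside PLOT, the following Python sketch rebuilds the N x N matrix from records laid out as (25F3.0), that is, up to 25 fields of 3 columns per record. It rests on an assumption the text does not confirm: that each matrix row begins on a new record, so a row of N entries occupies ceil(N/25) records. The function name `read_connection_matrix` is hypothetical.

```python
def read_connection_matrix(records, n, per_record=25, width=3):
    """Rebuild an n x n 0/1 matrix from fixed-width text records.

    Assumption: every matrix row starts on a new record. Blank fields
    are read as 0, as a Fortran F item would read them.
    """
    records_per_row = -(-n // per_record)  # ceiling division
    matrix = []
    for i in range(n):
        row = []
        for j in range(records_per_row):
            rec = records[i * records_per_row + j]
            count = min(per_record, n - j * per_record)
            for k in range(count):
                field = rec[k * width:(k + 1) * width].strip()
                row.append(int(float(field)) if field else 0)
        matrix.append(row)
    return matrix


records = [" 0. 1. 0.",
           " 1. 0. 1.",
           " 0. 1. 0."]
print(read_connection_matrix(records, 3))
```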
control cards

Note: the control cards appear in the deck in the order in which
they appear below.

1) the title card. This card contains information which will
be placed at the top of each page.

2) the jobtitle card. The information on this card appears on
the first page of the 4060 output, which serves as a burst page.
The fields on this card are:

     cols. 1-8. the project number, i.e., the CUCC account number.
     cols. 9-16. the programmer name.
     cols. 17-24. the jobname. any title which is meaningful to the
     user.

3) the format statement for the configuration from MDSCAL. This
format statement, as well as all others, must be punched on one card.

4) the format statement for the adjacency matrix. Normally
(25F3.0).

5) the format statement for the dictionary.

6) the parameter card. The fields are:

     cols. 1-3. the number of points in the graph.
     cols. 4-6. the length of the graph, i.e., the number of pages
     long and wide.

job setup

At CUCC, the following job setup may be used:

     /*SETUP tape to contain 4060 output
     /*SETUP tape containing adjacency matrix
     /*SETUP DEVICE=2311,ID=MDK004
     //JOBLIB DD DSN=SYS5.SOCIO,DISP=(SHR,PASS),UNIT=2311,VOL=SER=MDK004
     // EXEC PGM=PLOT,TIME=n
     //FT06F001 DD SYSOUT=A          (0 indicates a digit zero,
                                      O indicates a letter O)
     //FT10F001 DD data set containing adjacency matrix
     //SC4060ZZ DD data set containing 4060 output
     //FT05F001 DD *
     control cards
     /*
     //FT08F001 DD *
     output configuration from MDSCAL
     /*
     //FT09F001 DD *
     dictionary
     /*

limitations

1. The number of points may not exceed 1000.

2. The number of connections or relationships may not exceed 3000.

3. The number of digits in an identification may not exceed 4.



Hierarchical Clustering (HIER)



The hierarchical clustering program was developed and written
by S. C. Johnson of Bell Labs.

strategy 

The primary use for the hierarchical clustering program is
to separate a population into disjoint subsets. The procedure is
as follows. The population is renumbered and a relatedness matrix
is computed for it. This relatedness matrix is then clustered
using HIER. Disjoint subsets can be discovered in the printed
output labeled "Connectedness Method." Each subset will be a con-
tiguous set of identifications in the printed output which is
connected at the 0.0 level to every individual not in the subset.

Additionally the "Diameter Method" may be used for detailed
sociometric analysis.
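The notion of a disjoint subset used here can be illustrated independently of HIER. The Python sketch below is not Johnson's clustering algorithm; it only groups individuals so that members of different groups have relatedness 0.0, which is the condition the printed output is read for. The function name `disjoint_subsets` and the list-of-lists matrix are assumptions of the sketch.

```python
def disjoint_subsets(relatedness):
    """Split individuals 1..n into groups such that any two individuals
    in different groups have relatedness 0 in both directions."""
    n = len(relatedness)
    seen = [False] * n
    groups = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, group = [start], []
        while stack:
            i = stack.pop()
            group.append(i + 1)  # report 1-based identifications
            for j in range(n):
                linked = relatedness[i][j] > 0 or relatedness[j][i] > 0
                if linked and not seen[j]:
                    seen[j] = True
                    stack.append(j)
        groups.append(sorted(group))
    return groups


r = [[0.0, 0.5, 0.0, 0.0],
     [0.5, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.2],
     [0.0, 0.0, 0.2, 0.0]]
print(disjoint_subsets(r))
```

Each group found this way could then be renumbered and analyzed separately.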

control cards

The control cards are as follows; they appear in the deck in
the order in which their descriptions occur below:

a) Title card. Cols. 1-80 may contain any text.

b) Parameter card. The fields are as follows:

     cols. 1-3. The number of individuals to be clustered.

     cols. 4-5. Punch a -1.

     col. 6. Punch a 1 if the data (relatedness matrix) is
     on tape. Otherwise, leave blank. If the data is on tape
     an FT08F001 DD statement must appear.

     cols. 7-16. A missing data code. The value punched here
     will replace any zero in the relatedness matrix. Leave blank
     if the zeroes in the relatedness matrix are to remain un-
     changed (the normal situation).

c) The format card by which the relatedness matrix is to be
read.

data

The data is normally the relatedness matrix produced by the
connectivity program. The matrix may be either on cards, in which
case it is placed right after the control cards in the deck, or on
tape, in which case there must be an FT08F001 DD statement describ-
ing it.

job setup

At CUCC the job setup is as follows:

     /*SETUP tape containing relatedness matrix
     /*SETUP DEVICE=2311,ID=MDK004
     //JOBLIB DD DSN=SYS5.SOCIO,DISP=(SHR,PASS),
     // UNIT=2311,VOL=SER=MDK004
     // EXEC PGM=HIER,TIME=n
     //FT01F001 DD UNIT=DISK,SPACE=(3624,(m)),
     // DCB=(RECFM=V,LRECL=3620,BLKSIZE=3624)
     //FT06F001 DD SYSOUT=A
     //FT08F001 DD description of the data set containing the
                   relatedness matrix.
     //FT05F001 DD *
     control cards and, possibly, data
     /*

The region parameter on the job card must be calculated as



62 + buffers + where N is the number of individuals to be 

clustered and 'buffers' is the space for input-output buffers (as a
general rule, 15K will be adequate).



The parameter 'm' in the FT01F001 DD statement is the next

integer larger than ^N(N-l) ^ 
CHAIN



User's Manual for a Sociometric Linkage Program

and

Columbia University



January, 1970

Abstract



CHAIN is a program which aids in the analysis of sociometric
data. In its present stage the program analyzes the relation-
ships between any two elements of a group by constructing all
possible linkages, or paths, between the two. The result is
much the same as that produced by algorithms involving matrix
multiplication but execution time is much faster. Further,
unlike matrix multiplication algorithms, most of which have no
end point, CHAIN terminates construction of a chain for a given
element when no new elements are added in expanding the socio-
metric choices. Various matrices, distributions and statistical
measures are produced during execution of the program.

CHAIN is written in Fortran IV for an IBM 360 with at least
250K bytes of storage and is designed to process up to 999
individuals making a maximum of 19 choices per person. The
capacity of the program may be increased by enlarging the
dimensions as long as appropriate storage is available at
your installation.

Description



Basic output:

This program can be used to obtain the following measures for
a group and the elements or individuals of the group:

     1. a distance matrix indicating the shortest
        path from any person to any other person

     2. an ultimate connectedness matrix indicating
        if two people are in any way connected to
        each other

     3. linkages for each individual at each remove

     4. aggregate measures characterizing the choice
        network for each individual as well as the network
        of choices received by that person, termed forward
        and backward connectedness, respectively

     5. a distribution of the number of choices (and times
        chosen) and summary statistics for each person
        by remove

     6. Coleman's measure of ultimate connectedness
        (percent of possible connections)

     7. summary statistics of both the rows (choices) and
        columns (times chosen) of the entire distance matrix



Optional features:

CHAIN can also be used to perform operations on the input
data before analysis. The following options may be performed:

     1. deletion of non-reciprocated choices

     2. addition of non-reciprocated choices

     3. punching of a reformatted input deck, i.e., if
        the input deck contains duplicate choices and
        embedded blanks or zeros, the CHAIN program will
        eliminate them, left adjust the choices and
        output a new deck
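The three options can be sketched in Python as follows. The sketch is illustrative only and is not taken from the CHAIN source: the function names are hypothetical, choices are held as a mapping from each person to a list of chosen IDs, and it is assumed that every chosen person also appears as a key (every person has a card).

```python
def clean_choices(raw):
    """Drop blank (None) or zero entries and duplicate choices,
    keeping the remaining choices left adjusted in their original order."""
    seen, kept = set(), []
    for c in raw:
        if c and c not in seen:
            seen.add(c)
            kept.append(c)
    return kept


def add_unreciprocated(choices):
    """Symmetrize by addition: if i chooses j, j also chooses i."""
    result = {i: list(cs) for i, cs in choices.items()}
    for i, cs in choices.items():
        for j in cs:
            if i not in result[j]:
                result[j].append(i)
    return result


def delete_unreciprocated(choices):
    """Symmetrize by deletion: keep i -> j only if j also chooses i."""
    return {i: [j for j in cs if i in choices[j]]
            for i, cs in choices.items()}


choices = {1: [2, 3], 2: [1], 3: []}
print(add_unreciprocated(choices))
print(delete_unreciprocated(choices))
print(clean_choices([3, 0, 3, None, 5]))
```

Note that symmetrizing by addition can raise the number of choices held for a person above the number originally punched.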
Algorithm:

Given person i and his k sociometric selections,
a chain is generated in the following manner:

     a. consider the k choices of person i as direct choices
        (1 remove)

     b. look at the k choices of each of the previous k chosen
        persons; if they are represented at the first remove
        then we ignore them, if not they are considered to be
        2 removes from i

     c. repeat step b for the new choices generated until there
        are no longer any new choices in i's chain; the new
        choices at the m-th cycle represent choices m+1 removes
        away from person i

     d. record the constructed chain and move to the next person

     e. repeat until all persons in the group have been analyzed,
        producing the distance matrix of shortest paths from
        person i to j. Summary statistics are produced from this
        matrix
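Steps a through e amount to a breadth-first expansion of each person's choices. A minimal Python sketch follows; it is not the CHAIN Fortran source, the names `chain` and `distance_matrix` are hypothetical, and the optional `max_remove` argument is an assumption modelled on the maximum chain length that can be given on the control card.

```python
def chain(choices, i, max_remove=None):
    """Return {person: remove} for everyone reachable from person i.

    choices maps each person to the list of persons he chooses.
    """
    distance = {}
    frontier = list(choices[i])  # step a: direct choices, 1 remove
    remove = 1
    while frontier and (max_remove is None or remove <= max_remove):
        new = []
        for j in frontier:
            # persons already in the chain (or i himself) are ignored
            if j != i and j not in distance:
                distance[j] = remove
                new.append(j)
        # steps b and c: expand only the persons added at this remove
        frontier = [k for j in new for k in choices[j]]
        remove += 1
    return distance


def distance_matrix(choices):
    """Steps d and e: build the chain of every person in the group."""
    return {i: chain(choices, i) for i in choices}


choices = {1: [2], 2: [3], 3: [1], 4: []}
print(distance_matrix(choices))
```

The loop stops for a given person as soon as a remove adds no new persons, which is the termination rule described in the abstract.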



Coding of the data:

Each individual responding to the sociometric item and each
person selected by him must be given a unique integer identifi-
cation number. The input data deck must contain a card(s) for
each person numbered even if he makes no choices. That card(s)
would contain only an ID number in the appropriate columns.

The data card(s) for each person in the group to be analyzed
must contain the following information in the following sequence:

     a. a previous or old ID number. If no such number
        exists, i.e., if the deck is already serially
        numbered, the user must leave several columns blank
        at the beginning of the card. This field is used by
        the program in punching statistical output so
        that it may be correlated with other data
        available for the subjects

     b. the second field should be the person's sequential
        ID number and the data should be input in this
        sequence 1,...,n

     c. the remaining fields are the ID numbers of the
        people he chooses, recoded according to the
        sequential list

The number of cards per person may vary since the user indicates
to the program the format of the input data. Again, all persons
as input to CHAIN must be numbered serially. The largest ID
number must correspond to the number of people as input. A person
chosen by person i who makes no choices himself must still have
his own input card.

Organization of the data sets:

Multiple groups are allowed, the entire CHAIN routine being
repeated for each group. The first card of any data set (here
we mean the entire data input to CHAIN including control cards)
indicates the number of groups which the program is to analyze;
the number is punched right justified in columns 1 to 4. This
card is used only once and precedes all groups and their control
cards. The data input for each group follows this data set card
but each group must conform to the following organization:

     1. Title card

     2. Control card(s)

     3. Format card(s)

     4. Data cards

1. Title card: Alpha-numeric information punched in columns 1-80
   of this card will be used to label the output. If no label
   is desired a blank card must be inserted. Blanks may occur
   in the title card.







2. Control card(s): (all numbers are punched right justified)

CARD 1

Column(s)   Description

 3- 5       Number of people in the group to be analyzed; may vary
            to 999 but may not exceed that figure; must be punched
            on card

 9-10       Number of sociometric choices per person; may vary to 19
            but cannot exceed this figure; must be punched on card

15          2=data is to be symmetricized by adding non-reciprocated
              choices
            3=data is to be symmetricized by deleting non-reciprocated
              choices
            0,BLANK=neither of the above is desired, raw data used

20          1=new deck is desired after reformatting has been done
            0,BLANK=new deck is not desired

25          1=each person's chains are to be written out (Warning:
              this option can produce excessive amounts of printed
              output depending on the size of the group and its
              density)
            0,BLANK=chains not desired

30          1=user wants connectedness matrix (i,j-th cell is printed
              "X" if i and j are ultimately connected,   if not)
            0,BLANK=connectedness not desired

35          1=user wants distance matrix
            0,BLANK=distance matrix not desired

40          1=all rows of matrix are to be outputted before next
              column
            0,BLANK=matrix form of output for either connectedness
              or distance matrix desired

41-45       Length of longest indirect choice chain to be used;
            integer, right justified; should not be larger than
            number of people in group
            BLANK=default value of N, the size of the group

50          1=punched card or written (on user's tape) statistical
              output desired; old ID is written before sequential
              ID used by program
            0,BLANK=statistical data not desired

55          Indicates whether or not user desires to change I/O
            logical unit numbers for use during execution (with the
            exception of the card reader =5 and printer =6, which
            must be changed in the source deck)
            1=some or all optional device numbers are to be changed
            0,BLANK=default values are to be used
            If user indicates a change is desired in one or more of
            the optional unit device numbers he must insert CARD 2;
            if not, this second card is omitted

56-65       Attenuation constant used in statistical measures;
            digits to the right of column 60 assume fractional
            value, no decimal point is needed
            0,BLANK=default value of 0.5000
            NOTE: unless there exists good reason for
            not doing so, it should be fractional
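As an aid to preparing CARD 1, the Python sketch below lays out the fields whose columns are stated unambiguously above (columns 3-5, 9-10, 15, 20, 25, 30 and 35). It is an illustration only; the function name `control_card_1` and its keyword names are hypothetical, and the remaining fields of the card are simply left blank, which the table describes as the default.

```python
def control_card_1(n_people, n_choices, symmetrize=0, new_deck=0,
                   chains=0, connectedness=0, distance=0):
    """Return an 80-column image of CARD 1 with right-justified fields."""
    if not 1 <= n_people <= 999:
        raise ValueError("number of people must be between 1 and 999")
    if not 0 <= n_choices <= 19:
        raise ValueError("number of choices may not exceed 19")
    if symmetrize not in (0, 2, 3):
        raise ValueError("symmetrize must be 0, 2 (add) or 3 (delete)")

    card = [" "] * 80

    def put(value, last_col, width=1):
        # last_col is the 1-based column in which the field ends
        card[last_col - width:last_col] = str(value).rjust(width)

    put(n_people, 5, 3)     # columns 3-5
    put(n_choices, 10, 2)   # columns 9-10
    flags = ((symmetrize, 15), (new_deck, 20), (chains, 25),
             (connectedness, 30), (distance, 35))
    for flag, col in flags:
        if flag:            # a zero flag is left blank
            put(flag, col)
    return "".join(card)


print(control_card_1(150, 5, symmetrize=2, distance=1))
```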

CARD 2 (optional device card; not to be inserted if column 55 of
first control card is blank)

Column(s)   Description

 3- 4       Scratch tape or disk for recording connectedness matrix
            DEFAULT VALUE=1

 7- 8       Scratch tape or disk for recording distance matrix
            DEFAULT VALUE=2

11-12       Input device for data
            DEFAULT VALUE=5 card reader

15-16       Statistical output device (if column 45 of card one
            is a 1)
            DEFAULT VALUE=7 card punch

19-20       Output device for reformatted deck
            DEFAULT VALUE=7 card punch



The data on these units is written out in an unformatted
mode. The JCL for these units should be variable blocked.
The amount of data written on these units varies with the number
of people being analyzed. This fact is important when disks
are used as scratch units. For less than 200 people with 5
choices per person, SPACE=(3620,(100,100)) should suffice.
The most space used was for 900 people with 15 choices
(this group was very dense) and amounted to 400 tracks, i.e.,
SPACE=(3620,(400,100)).



3. Format Card(s): Only one format card is required if no
   new deck is to be punched. This format card indicates
   the arrangement of the data to be input. Columns
   1-80, left justified, on this card may be used. The
   first field specified must be for the old ID number,
   the second for the serialized ID number that is used
   by this program for processing; the remaining fields
   represent the person's choices. All fields must be
   integer.

   Example 1: No old ID is available, serial ID is in columns 3-5,
              five choices, each in 4 columns beginning in column 8

              (I2,I3,2X,5I4)

   Example 2: Old ID in col 1-4, new ID in col 8-10 and 14 choices,
              nine of 3 columns and five of 4 columns, starting in
              column 11

              (I4,3X,10I3/5I4)
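The layout implied by Example 2 can be made concrete with a short Python sketch that reads one person's two cards by column position. The function name `read_person` is hypothetical, and dropping blank or zero fields is an assumption in line with the reformatting option described earlier; CHAIN itself reads the cards through the Fortran format.

```python
def read_person(card1, card2):
    """Read one person laid out as (I4,3X,10I3/5I4).

    Card 1: old ID in cols 1-4, new ID in cols 8-10, nine 3-column
    choices in cols 11-37. Card 2 (after the slash): five 4-column
    choices in cols 1-20. Blank or zero fields are dropped.
    """
    card1 = card1.ljust(80)
    card2 = card2.ljust(80)
    old_id = int(card1[0:4])
    new_id = int(card1[7:10])
    fields = [card1[10 + 3 * k:13 + 3 * k] for k in range(9)]
    fields += [card2[4 * k:4 * k + 4] for k in range(5)]
    choices = [int(f) for f in fields if f.strip() and int(f) != 0]
    return old_id, new_id, choices


card1 = "1042" + "   " + "  7" + "  3" + " 12"
card2 = "   9"
print(read_person(card1, card2))
```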



   Second format card: used only if a new deck option has been
   indicated on the program control card. This format
   card specifies the way in which the user wants the
   reformatted deck punched out. The program does not
   output an old ID number when punching a new deck.
   The number of output items is, therefore, one less
   than the number input.



4. Input data: Data may be on cards or read in from tape or disk. If
   either of the latter two options is desired the correct
   column must be punched 1 on the program control card and the
   correct device must be indicated on the second control
   card. Data must conform to the requirements set earlier
   and must be in the correct order. If not, the program will
   detect an error and cancel; all succeeding program decks will
   be flushed.



References:

Coleman, James. Introduction to Mathematical Sociology.
Free Press, 1964, Chapter 14.

A Note on Further Use of the Distance and Connectedness Matrices



By specifying the optional tape or disk devices the user can
record the distance and/or connectedness matrix for further analysis.
The method for indicating the use of user tapes or disks rather
than scratch disks is described in the section on organization of
data sets. However, it is necessary that the user be familiar with
the format of this output in order to utilize it correctly.

Both the distance and connectedness matrices are written with-
out the use of format control, i.e., in binary mode. In order to
read them the user must use, if he is using a Fortran program, an
unformatted read statement. The use of such statements is described
in the IBM publication, "Programmers Guide to Fortran."

Distance Matrix:

The format of the distance matrix is as follows: each row is
written out serially. The first entry of a record is the row
number followed by a vector of length N, where N is the number of
persons in the group. Each entry in this vector represents a value
from 1 to N (or if the user has specified a maximum chain length,
a distance from 1 to the maximum), which is the distance that
person i (if we are on the ith row) is from person j (the jth cell
of the row vector). Hence, the user must first read the row number
and then a vector of length N.

      REWIND 8
      DO 100 K=1,N
      READ(8) NROW,(NARRAY(I),I=1,N)
      (user statements)
  100 CONTINUE

In the example above, the data is on logical unit 8, NROW is the
row number (chooser) and NARRAY(I) is the distance of the other
persons from person NROW.


Connectedness matrix:

The connectedness matrix, on logical unit 1 unless the user
specifies another device number, is formatted as follows: Record A,
the number of choices of the ith person (all removes), NTOT;
Record B, the number of the ith person and NTOT entries, each being
an ID number of a person who is chosen by person I.

      REWIND 10
      DO 100 K=1,N
      READ(10) NTOT
      READ(10) I,(NARRAY(KK),KK=1,NTOT)
      (user statements)
  100 CONTINUE



In the example above, each entry in NARRAY will be an ID number
(there are NTOT such entries) and each number identifies a choice
of person I. The ID's are the sequential or new ID's.

JOB CONTROL LANGUAGE (JCL)

Example 1: Source deck is used. Input data on card reader.
           Scratch units not varied.

     //JOB ... ... ,REGION=300K
     /*FORMAT PR,DDNAME=FT06F001,OVFL=ON
     // EXEC FORTGCLG,PARM.FORT ,TIME.GO=nn
     //FORT.SYSIN DD *
     (Source deck)
     /*
     //GO.FT01F001 DD SPACE=(3620,(100,100))
     //GO.FT02F001 DD SPACE=(3620,(100,100))
     //GO.SYSIN DD *
     (Data set(s))
     /*



Example 2: Object deck, input data on logical unit 8 (tape),
           scratch unit 1 (connectedness matrix) on logical
           unit 11 (tape), scratch unit 2 (distance matrix) on
           logical unit 12 (tape)

     //JOB ,REGION=300K
     /*FORMAT PR,DDNAME=FT06F001,OVFL=ON
     /*SETUP ...
     /*SETUP ... (RING on units 11 and 12, NORING on 8)
     /*SETUP ...
     // EXEC FORTGLG
     //LKED.SYSIN DD *
     (object deck)
     /*
     //GO.FT08F001 DD UNIT=2400-9,DISP=OLD,VOL=SER=xxxxxx,
     // DCB=(RECFM=FB,LRECL=80,BLKSIZE=80),DSN=INPUT,
     // LABEL=(n,SL)
     //GO.FT11F001 DD UNIT=2400-9,DISP=NEW,VOL=SER=xxxxxx,
     // DCB=(RECFM=V,BLKSIZE=3620),DSN=CONNECT,LABEL=(n,SL)
     //GO.FT12F001 DD UNIT=2400-9,DISP=NEW,VOL=SER=xxxxxx,
     // DCB=(RECFM=V,BLKSIZE=3620),DSN=DISTAN,LABEL=(n,SL)
     //GO.SYSIN DD *
     (data set(s))